116 research outputs found

Convergence Rate of Frank-Wolfe for Non-Convex Objectives

Author: Lacoste-Julien Simon
Publication date: 01/06/2016

We give a simple proof that the Frank-Wolfe algorithm obtains a stationary point at a rate of $O(1/\sqrt{t})$ on non-convex objectives with a Lipschitz continuous gradient. Our analysis is affine invariant and is the first, to the best of our knowledge, giving a similar rate to what was already proven for projected gradient methods (though on slightly different measures of stationarity).

Comment: 6 pages
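To make the stationarity measure concrete, here is a minimal sketch of Frank-Wolfe on a toy non-convex objective over the probability simplex. It tracks the Frank-Wolfe gap, which is non-negative and vanishes exactly at stationary points. The objective, the constant `C`, and the helper names are hypothetical illustrations. The step size uses a Euclidean Lipschitz constant rather than an affine-invariant curvature constant, so this is a simplified variant and not necessarily the scheme analyzed in the paper.

```python
import math

def frank_wolfe(grad, lmo, x0, lipschitz, num_steps):
    """Frank-Wolfe with a gap-based step size.

    Returns the last iterate and the smallest Frank-Wolfe gap observed.
    """
    x = list(x0)
    best_gap = math.inf
    for _ in range(num_steps):
        g = grad(x)
        s = lmo(g)  # vertex minimizing <s, g> over the feasible set
        d = [si - xi for si, xi in zip(s, x)]
        # Frank-Wolfe gap <x - s, grad f(x)>: non-negative, zero at stationary points.
        gap = -sum(gi * di for gi, di in zip(g, d))
        best_gap = min(best_gap, gap)
        d_norm_sq = sum(di * di for di in d)
        if gap <= 1e-12 or d_norm_sq == 0.0:
            break
        # Minimizes the quadratic upper bound implied by the Lipschitz gradient
        # (assumption: Euclidean constant, not the affine-invariant one).
        gamma = min(1.0, gap / (lipschitz * d_norm_sq))
        x = [xi + gamma * di for xi, di in zip(x, d)]
    return x, best_gap

def simplex_lmo(g):
    """Linear minimization oracle for the probability simplex."""
    i = min(range(len(g)), key=lambda k: g[k])
    return [1.0 if k == i else 0.0 for k in range(len(g))]

# Hypothetical non-convex test objective: f(x) = sum_i cos(4 x_i) + c_i x_i.
C = [0.3, -0.2, 0.1]

def f(x):
    return sum(math.cos(4.0 * xi) + ci * xi for xi, ci in zip(x, C))

def grad_f(x):
    return [-4.0 * math.sin(4.0 * xi) + ci for xi, ci in zip(x, C)]

x0 = [1.0 / 3.0] * 3
# |f''| <= 16 coordinate-wise, so 16 is a valid Lipschitz constant for grad_f.
x_final, best_gap = frank_wolfe(grad_f, simplex_lmo, x0, lipschitz=16.0, num_steps=500)
```

With this step size the objective value never increases from one iterate to the next, and `best_gap` is the quantity whose decay a rate such as $O(1/\sqrt{t})$ would describe.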

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

PAC-Bayesian Theory Meets Bayesian Inference

Authors: Bach Francis, Germain Pascal, Lacoste Alexandre, Lacoste-Julien Simon
Publication date: 27/05/2016

We exhibit a strong link between frequentist PAC-Bayesian risk bounds and the Bayesian marginal likelihood. That is, for the negative log-likelihood loss function, we show that the minimization of PAC-Bayesian generalization risk bounds maximizes the Bayesian marginal likelihood. This provides an alternative explanation to the Bayesian Occam's razor criteria, under the assumption that the data is generated by an i.i.d. distribution. Moreover, as the negative log-likelihood is an unbounded loss function, we motivate and propose a PAC-Bayesian theorem tailored for the sub-gamma loss family, and we show that our approach is sound on classical Bayesian linear regression tasks.

Comment: Published at NIPS 2015 (http://papers.nips.cc/paper/6569-pac-bayesian-theory-meets-bayesian-inference)
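One way to see why such a link is plausible is the standard variational identity for Gibbs posteriors, sketched below. The notation is an assumption introduced here, not taken from the listing: $\pi$ is a prior over parameters $\theta$, $\rho$ is any posterior, $X = (x_1, \dots, x_n)$ is the sample, and $\widehat{L}(\theta) = -\frac{1}{n}\sum_{i=1}^{n} \ln p(x_i \mid \theta)$ is the empirical negative log-likelihood.

```latex
\min_{\rho}\Big\{ n\,\mathbb{E}_{\theta\sim\rho}\,\widehat{L}(\theta)
      + \mathrm{KL}(\rho\,\|\,\pi) \Big\}
  \;=\; -\ln \mathbb{E}_{\theta\sim\pi}\, e^{-n\widehat{L}(\theta)}
  \;=\; -\ln \int \pi(\theta)\, p(X\mid\theta)\, d\theta ,
\qquad
\rho^{*}(\theta) \;\propto\; \pi(\theta)\, p(X\mid\theta).
```

The left-hand side has the shape of the posterior-dependent part of a typical PAC-Bayesian bound (empirical risk plus a KL term). Its minimizer is the Bayesian posterior, and its minimum value is the negative log marginal likelihood, so making this term small corresponds to making the marginal likelihood large.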

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization

Authors: Lacoste-Julien Simon, Leblond Rémi, Pedregosa Fabian
Publication date: 05/11/2017

Due to their simplicity and excellent performance, parallel asynchronous variants of stochastic gradient descent have become popular methods to solve a wide range of large-scale optimization problems on multi-core architectures. Yet, despite their practical success, support for nonsmooth objectives is still lacking, making them unsuitable for many problems of interest in machine learning, such as the Lasso, group Lasso or empirical risk minimization with convex constraints. In this work, we propose and analyze ProxASAGA, a fully asynchronous sparse method inspired by SAGA, a variance reduced incremental gradient algorithm. The proposed method is easy to implement and significantly outperforms the state of the art on several nonsmooth, large-scale problems. We prove that our method achieves a theoretical linear speedup with respect to the sequential version under assumptions on the sparsity of gradients and block-separability of the proximal term. Empirical benchmarks on a multi-core architecture illustrate practical speedups of up to 12x on a 20-core machine.

Comment: Appears in Advances in Neural Information Processing Systems 30 (NIPS 2017), 28 pages

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives

Authors: Bach Francis, Defazio Aaron, Lacoste-Julien Simon
Publication date: 01/11/2014

In this work we introduce a new optimisation method called SAGA in the spirit of SAG, SDCA, MISO and SVRG, a set of recently proposed incremental gradient algorithms with fast linear convergence rates. SAGA improves on the theory behind SAG and SVRG, with better theoretical convergence rates, and has support for composite objectives where a proximal operator is used on the regulariser. Unlike SDCA, SAGA supports non-strongly convex problems directly, and is adaptive to any inherent strong convexity of the problem. We give experimental results showing the effectiveness of our method.

Comment: Advances In Neural Information Processing Systems, Nov 2014, Montreal, Canada
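The composite (proximal) update described in the abstract can be sketched as follows for a small Lasso problem. This is an illustrative sequential sketch, not the authors' reference code: the function names, the toy data, and the choice of an L1 regulariser are assumptions made here. The step size 1/(3L) is the one commonly quoted for SAGA in the non-strongly convex setting.

```python
import math
import random

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]

def objective(A, b, lam, x):
    """(1/n) * sum_i 0.5 * (a_i . x - b_i)^2 + lam * ||x||_1."""
    n = len(A)
    loss = sum(0.5 * (sum(a * xk for a, xk in zip(row, x)) - bi) ** 2
               for row, bi in zip(A, b)) / n
    return loss + lam * sum(abs(xk) for xk in x)

def saga_lasso(A, b, lam, num_steps, seed=0):
    rng = random.Random(seed)
    n, d = len(A), len(A[0])

    def grad_i(x, i):
        # Gradient of f_i(x) = 0.5 * (a_i . x - b_i)^2.
        r = sum(a * xk for a, xk in zip(A[i], x)) - b[i]
        return [r * a for a in A[i]]

    L = max(sum(a * a for a in row) for row in A)  # max Lipschitz constant of the f_i
    gamma = 1.0 / (3.0 * L)
    x = [0.0] * d
    table = [grad_i(x, i) for i in range(n)]       # stored gradient for each f_i
    avg = [sum(g[k] for g in table) / n for k in range(d)]
    for _ in range(num_steps):
        j = rng.randrange(n)
        g_new, g_old = grad_i(x, j), table[j]
        # Variance-reduced gradient estimate, then the proximal step.
        w = [x[k] - gamma * (g_new[k] - g_old[k] + avg[k]) for k in range(d)]
        x = soft_threshold(w, gamma * lam)
        # Keep the running average consistent with the updated table entry.
        avg = [avg[k] + (g_new[k] - g_old[k]) / n for k in range(d)]
        table[j] = g_new
    return x

# Hypothetical toy data, consistent with x = (1, 2) when lam = 0.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b = [1.0, 2.0, 3.0, -1.0]
x_hat = saga_lasso(A, b, lam=0.01, num_steps=2000)
```

The gradient table is what distinguishes this from plain stochastic gradient descent: each step corrects the fresh gradient of one term by its stored value and the table average, which is how the variance of the estimate shrinks as the iterates settle.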

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

Frank-Wolfe Algorithms for Saddle Point Problems

Authors: Gidel Gauthier, Jebara Tony, Lacoste-Julien Simon
Publication date: 25/10/2016

We extend the Frank-Wolfe (FW) optimization algorithm to solve constrained smooth convex-concave saddle point (SP) problems. Remarkably, the method only requires access to linear minimization oracles. Leveraging recent advances in FW optimization, we provide the first proof of convergence of a FW-type saddle point solver over polytopes, thereby partially answering a 30-year-old conjecture. We also survey other convergence results and highlight gaps in the theoretical underpinnings of FW-style algorithms. Motivating applications without known efficient alternatives are explored through structured prediction with combinatorial penalties as well as games over matching polytopes involving an exponential number of constraints.

Comment: Appears in: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017). 39 pages

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Rethinking LDA: moment matching for discrete ICA

Authors: Bach Francis, Lacoste-Julien Simon, Podosinnikova Anastasia
Publication date: 05/11/2015

We consider moment matching techniques for estimation in Latent Dirichlet Allocation (LDA). By drawing explicit links between LDA and discrete versions of independent component analysis (ICA), we first derive a new set of cumulant-based tensors, with an improved sample complexity. Moreover, we reuse standard ICA techniques such as joint diagonalization of tensors to improve over existing methods based on the tensor power method. In an extensive set of experiments on both synthetic and real datasets, we show that our new combination of tensors and orthogonal joint diagonalization techniques outperforms existing moment matching methods.

Comment: 30 pages; added plate diagrams and clarifications, changed style, corrected typos, updated figures. in Proceedings of the 29-th Conference on Neural Information Processing Systems (NIPS), 2015

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

On the Equivalence between Herding and Conditional Gradient Algorithms

Authors: Bach Francis, Lacoste-Julien Simon, Obozinski Guillaume
Publication date: 01/01/2012

We show that the herding procedure of Welling (2009) takes exactly the form of a standard convex optimization algorithm--namely a conditional gradient algorithm minimizing a quadratic moment discrepancy. This link enables us to invoke convergence results from convex optimization and to consider faster alternatives for the task of approximating integrals in a reproducing kernel Hilbert space. We study the behavior of the different variants through numerical simulations. The experiments indicate that while we can improve over herding on the task of approximating integrals, the original herding algorithm tends to approach more often the maximum entropy distribution, shedding more light on the learning bias behind herding.
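The claimed equivalence can be checked numerically on a small finite example. The sketch below runs herding and a conditional gradient method with step size 1/(t+1) on the objective 0.5 * ||g - mu||^2, then compares the points they select. The feature vectors, the target weights, and the initialization w_0 = 0 are assumptions chosen for illustration. Exact rational arithmetic is used so that ties are broken identically in both procedures.

```python
from fractions import Fraction

def argmax_inner(w, feats):
    """Index of the feature vector maximizing <w, phi>; ties go to the lowest index."""
    return max(range(len(feats)),
               key=lambda i: sum(wk * fk for wk, fk in zip(w, feats[i])))

def herding(mu, feats, num_steps):
    w = [Fraction(0)] * len(mu)  # assumption: w_0 = 0
    picks = []
    for _ in range(num_steps):
        i = argmax_inner(w, feats)
        picks.append(i)
        # Herding update: w <- w + mu - phi(x).
        w = [wk + mk - fk for wk, mk, fk in zip(w, mu, feats[i])]
    return picks, w

def conditional_gradient(mu, feats, num_steps):
    g = [Fraction(0)] * len(mu)
    picks = []
    for t in range(num_steps):
        if t == 0:
            # The first step is arbitrary (its step size is 1); mirror w_0 = 0.
            direction = [Fraction(0)] * len(mu)
        else:
            # Negative gradient of 0.5 * ||g - mu||^2.
            direction = [mk - gk for mk, gk in zip(mu, g)]
        i = argmax_inner(direction, feats)
        picks.append(i)
        rho = Fraction(1, t + 1)
        g = [(1 - rho) * gk + rho * fk for gk, fk in zip(g, feats[i])]
    return picks, g

# Hypothetical finite sample space with 2-d features and target weights.
feats = [[Fraction(0), Fraction(1)], [Fraction(1), Fraction(0)],
         [Fraction(1), Fraction(1)], [Fraction(-1), Fraction(0)]]
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
mu = [sum(p * f[k] for p, f in zip(weights, feats)) for k in range(2)]

T = 20
herding_picks, w_T = herding(mu, feats, T)
cg_picks, g_T = conditional_gradient(mu, feats, T)
```

Under these assumptions the two selection sequences coincide, and the herding weight vector satisfies w_T = T * (mu - g_T), where g_T is the running average of the selected features.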

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server